-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tile() function part 0.5 #8
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great but I lack background on the tiles and register matrices, so nothing really deep from me this time. Mostly questions out of curiosity :)
template <typename ET, StorageOrder SO, typename FF, typename FP> | ||
BLAST_ALWAYS_INLINE void tile(std::size_t m, std::size_t n, FF&& f_full, FP&& f_partial) | ||
{ | ||
size_t constexpr TILE_SIZE = TileSize_v<ET>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we have matrix of integers, booleans or complex numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I think it will not compile, because we don't have TileSize_v<ET>
defined for those types.
{ | ||
size_t i = 0; | ||
|
||
// i + 4 * TILE_SIZE != M is to improve performance in case when the remaining number of rows is 4 * TILE_SIZE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit magic for me, can't really understand why is it more efficient :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where magic 3 and 4 come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The magic nombers come from a specific case when you have 16 AVX registers, each storing 4 double
s, and TILE_SIZE == 4
. When you have 16
rows left, it is more efficient (based on performance test) to apply a 8
-row kernel 2 times than a 12
-row kernel and then a 4
-row kernel.
This code is tied to a specific architecture and is actually very old. It should be re-written in more general way.
// it is more efficient to apply 2 * TILE_SIZE kernel 2 times than 3 * TILE_SIZE + 1 * TILE_SIZE kernel. | ||
for (; i + 3 * TILE_SIZE <= m && i + 4 * TILE_SIZE != m; i += 3 * TILE_SIZE) | ||
{ | ||
RegisterMatrix<ET, 3 * TILE_SIZE, TILE_SIZE, columnMajor> ker; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why columnMajor
here and not SO?
for (; i + 2 * TILE_SIZE <= m; i += 2 * TILE_SIZE) | ||
{ | ||
RegisterMatrix<ET, 2 * TILE_SIZE, TILE_SIZE, columnMajor> ker; | ||
f_full(ker, i, j); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Мне тут пояснительная бригада нужна, зачем мы применяем функтор к пустой матрице? А, я понял, это чтобы этими черипичками покрыть в случае четного и нечетного количества строк? Или нет?
if (i < m) | ||
{ | ||
RegisterMatrix<ET, TILE_SIZE, TILE_SIZE, columnMajor> ker; | ||
f_partial(ker, i, j, m - i, ker.columns()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TILE_SIZE != ker.columns()
?
@@ -51,83 +54,25 @@ namespace blast | |||
MatrixPointer<MPC> && StorageOrder_v<MPC> == columnMajor && | |||
MatrixPointer<MPD> && StorageOrder_v<MPD> == columnMajor | |||
) | |||
BLAZE_ALWAYS_INLINE void gemm(size_t M, size_t N, size_t K, ST1 alpha, MPA A, MPB B, ST2 beta, MPC C, MPD D) | |||
BLAST_ALWAYS_INLINE void gemm(size_t M, size_t N, size_t K, ST1 alpha, MPA A, MPB B, ST2 beta, MPC C, MPD D) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question, do we need to specify M, N, K
? Are not they part of MPA, MPB, MPC
?
{ | ||
using ET = std::remove_cv_t<ElementType_t<MPA>>; | ||
using ET = std::remove_cv_t<ElementType_t<MPD>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make a difference?
{ | ||
using ET = std::remove_cv_t<ElementType_t<MPA>>; | ||
using ET = std::remove_cv_t<ElementType_t<MPD>>; | ||
size_t constexpr TILE_SIZE = TileSize_v<ET>; | ||
|
||
BLAZE_CONSTRAINT_MUST_BE_SAME_TYPE(std::remove_cv_t<ElementType_t<MPB>>, ET); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to put in require
above?
Continuous integration broken because of a bug in Blaze: https://bitbucket.org/blaze-lib/blaze/commits/6058199308a8c741279373b4d81debbba5e5e850#comment-14858242 |
Blaze bug fixed https://bitbucket.org/blaze-lib/blaze/issues/451/error-compiling-bb-inv-trans-ll-for#comment-66451923, CI passes. |
Partial commit for #5